A CONTINUING QUANDARY FOR MOLAR MODELS OF OPERANT BEHAVIOR

Gregory Galbicka, Mary A. Kautz, and Traci Jagers

WALTER REED ARMY INSTITUTE OF RESEARCH


The number of responses rats made in a "run" of consecutive left-lever presses, prior to a trial-ending right-lever press, was differentiated using a targeted percentile procedure. Under the nondifferential baseline, reinforcement was provided with a probability of .33 at the end of a trial, irrespective of the run on that trial. Most of the 30 subjects made short runs under these conditions, with the mean for the group around three. A targeted percentile schedule was next used to differentiate run length around the target value of 12. The current run was reinforced if it was nearer the target than 67% of those runs in the last 24 trials that were on the same side of the target as the current run. Programming reinforcement in this way held overall reinforcement probability per trial constant at .33 while providing reinforcement differentially with respect to runs more closely approximating the target of 12. The mean run for the group under this procedure increased to approximately 10. Runs approaching the target length were acquired even though differentiated responding produced the same probability of reinforcement per trial, decreased the probability of reinforcement per response, did not increase overall reinforcement rate, and generally substantially reduced it (i.e., in only a few instances did response rate increase sufficiently to compensate for the increase in the number of responses per trial). Models of behavior predicated solely on molar reinforcement contingencies all predict that runs should remain short throughout this experiment, because such runs promote both the most frequent reinforcement and the greatest reinforcement per press. To the contrary, 29 of 30 subjects emitted runs in the vicinity of the target, driving down reinforcement rate while greatly increasing the number of presses per pellet. These results illustrate the powerful effects of local reinforcement contingencies in changing behavior, and in doing so underscore a need for more dynamic quantitative formulations of operant behavior to supplement or supplant the currently prevalent static ones.

Key words: percentile schedules, molecular analyses, response differentiation, run length, response acquisition, response number, reinforcement probability, lever press, rats


Quantitative models of respondent (Pavlovian) conditioning have achieved a fair degree of success predicting trial-by-trial changes in responding (e.g., Rescorla & Wagner, 1972). Models of operant conditioning, on the other hand, have in general been silent with respect to response acquisition, concentrating instead on the order seen globally in response and time allocation of steady-state behavior as a function of relative reinforcement density (e.g., Davison & McCarthy, 1988). The analysis of operant acquisition is at somewhat of a comparative disadvantage, because those studying Pavlovian conditioning wield almost complete control over all experimentally relevant stimuli, but those studying operant conditioning traditionally surrender a degree of freedom to the subject by programming reinforcement contingent on behavior. As a result, the experimenter is incapable of precisely controlling the relation between behavior and environmental consequences, because the “free operant” is exactly that: free to vary from place to place, time to time, and subject to subject. This variation seemingly denies systematic analysis of the action of reinforcement at a local level. Skinner (1966), for example, noted that a learning curve “merely describes the rather crude overall effects of adventitious contingencies, and it often tells us more about the apparatus or procedure than about the organism” (p. 17).

Seven years after Skinner’s (1966) pronouncement, John Platt developed the first in a class of procedures (e.g., Alleman & Platt, 1973; Platt, 1973) that overcame the shortcomings noted by Skinner and allowed a systematic analysis of operant acquisition and differentiation. The percentile reinforcement schedules he devised make explicit the reinforcement contingencies involved in response shaping while simultaneously controlling either reinforcement probability or rate, holding one constant across the course of a differentiation within a single subject as well as across different subjects and response dimensions (e.g., Platt, 1984; see Galbicka, 1988, for a review). Because of the experimental control they afford, the constraints on the analysis of operant acquisition noted by Skinner (1966) are greatly attenuated, allowing an experimental analysis of how reinforcement effects response acquisition and differentiation.

The present study details some data from the differentiation of response number in rats under targeted percentile schedules. This arrangement controls the overall probability of reinforcement while differentiating response values around a fixed value, or target. The dimension of responding differentiated here was the number of presses made on the left lever of a two-lever operant conditioning chamber prior to a single press on the right lever. The left-lever pressing on each trial comprised a “run,” and the percentile schedule differentially reinforced runs approximating a target of 12. This differential reinforcement was arranged by first determining whether the current run was shorter or longer than the target, and then comparing it to all prior runs within the most recent 24 trials that were likewise shorter (or longer, as the case may be) than the target. The reinforcement criterion was set such that two thirds of the comparison distribution fell outside the criterion zone, with the third closest to the target considered criterional (i.e., the criterional zone was above the 67th percentile of the distribution of runs shorter than the target and below the 33rd percentile of the distribution of runs longer than the target). This established a fixed probability of reinforcement equal to .33 at all times during the acquisition and maintenance of the differentiation for all subjects, independent of the absolute values of runs comprising the distribution at any particular time.

The present results demonstrate that reinforcement generates complex, tightly controlled behavioral sequences even when differentiated responding produces relatively little change in overall reinforcement probability, either leaves unchanged or reduces overall reinforcement rate, and increases the number of presses emitted per reinforcer. These effects hold true at all levels of meaningful aggregation, from entire conditions, to whole sessions, to blocks as short as 20 trials. As such, they illustrate that the relatively static quantitative formulations of operant behavior so far proposed, although very successfully describing some molar relations between aggregate behavior and reinforcement, can at best predict endpoints of more dynamic processes involving local reinforcement contingencies. Reinforcement changes behavior at a local level in such a way that subjects learn to emit complex patterns of behavior that decrease overall reinforcement density when doing so increases the immediate probability of food.

METHOD 

Subjects 

Subjects were 30 male Sprague-Dawley rats, fed freely to 350 g and maintained at that weight thereafter through restricted postsession feeding of chow. They were individually housed in acrylic rack-mounted cages lined with pine bedding, with freely available water in the home cage. The rack was removed from the colony room, which was maintained on a 12:12 hr light/dark cycle (onset time, 6:00 a.m.), at the same time every day and brought to the laboratory.

Apparatus 

Sessions were conducted in five identically configured operant conditioning chambers (Coulbourn Instruments, Inc.). The instrument panel of each contained two response levers mounted symmetrically around an aperture (6.25 cm by 3.5 cm) in which reinforcers, consisting of a 45-mg food pellet (BioServe), could be delivered via a solenoid-operated pellet dispenser mounted behind the panel. The levers (Coulbourn Instruments Model E23-05 on the left and E21-03 on the right) required between 0.15 and 0.3 N to operate. No effort was made to standardize the force required across levers; however, each subject’s box assignment remained constant, so the same requirement remained in force throughout the experiment. Each switch closure also operated a heavy-duty relay mounted behind the front wall above the food aperture. Above each lever were three lights (Sylvania 28ESB) mounted flush with the wall and covered with a red, green, or yellow cap. The floor of the chamber consisted of parallel stainless steel rods (0.5 cm diameter) spaced 1.8 cm, center to center. The chamber was entirely enclosed within a light- and sound-attenuating shell. White noise continuously present in the room helped further mask extraneous noise. A PDP® 11/73 minicomputer in an adjacent room, operating under the SKED-11 (Snapper & Inglis, 1985) software system, programmed stimuli and collected data. The percentile schedule comparisons and calculations were evaluated by a set of FORTRAN subroutines (available upon request from the first author). Sessions were also monitored via Gerbrands (Model C-3SH) cumulative recorders.

Procedure 

Following magazine training, during which pellets were delivered at random intervals averaging 30 s, pellets were delivered for any approach to and contact with either lever. Following this, pressing either lever produced a pellet. After 50 pellets, the procedure changed such that a green light was illuminated above one of the two levers, randomly selected on each trial, and only presses on that lever produced a pellet. This usually required a short period of remedial hand-shaping to move subjects from the preferred to the nonpreferred lever. After 100 presses under these contingencies, subjects moved rapidly between and pressed both levers. During the final pretraining condition, trials were signaled by illuminating the houselight and both green lights. A right-lever press following at least one left-lever press terminated a trial (right-lever presses prior to a left-lever press had no consequences) and initiated a 3-s blackout. Probability of pellet delivery following a trial was 1.0 during the first 33 trials, was .50 during the next 33 trials, and was subsequently reduced and maintained at .33 thereafter. This ultimate probability constituted the nondifferential reinforcement baseline and remained in effect for at least 15 sessions. During this and all subsequent conditions, sessions were conducted 5 days per week and lasted either 100 trials or 30 min, whichever occurred first.

The percentile procedure was then instituted, with a target value of 12 and a probability of a criterion run (w) of .33. Determining whether a run met criterion under this procedure involved three basic steps. First, the run was compared to the target to determine whether it was shorter or longer than the target. Next, the run was compared to all runs from the most recent 24 trials that were also short (or long, as the case may be) of the target. The number of such comparisons is denoted m. Finally, the run was considered criterional if it was closer to the target than k of the m comparison values, where k = (m + 1)(1 - w) = .67(m + 1).

The mechanics of the above procedure involved initially determining the relative deviation of the current run from the target by subtracting the former from the latter. The first comparison value in memory (stored as a signed deviation from target, as well) was then multiplied by the current deviation to determine whether it was on the same side of the target (i.e., if the product was negative, the signs must be opposite, and that comparison was skipped). Deviations of zero (i.e., runs equal to the target) were arbitrarily classed as positive. If the deviations were both positive or both negative, the absolute values of the current and the comparison deviation were compared, and one of three counters was incremented, depending on whether the current deviation was closer to, equally distant, or further from the target than the comparison deviation. These steps were then repeated for each deviation in the comparison memory. This yielded tallies on each trial of the number of comparisons on the same side of the target with deviations larger, equal to, or smaller than the current one. The sum of these three tallies constituted the number of comparisons on the same side of the target (m) for that trial. The program first evaluated whether the current run was strictly closer than enough comparison runs (the first tally) to exceed k, in which case it was considered criterional. Because the expression for k yields integer values only if m + 1 is a multiple of three, and the current deviation can only be closer to the target than an integer number of comparisons, k was rounded to the nearest integer. If the first tally did not exceed k, the number of equally distant deviations was added, and if this sum exceeded k, the run was considered criterional with a random probability equal to w (i.e., .33). Hence, even if all values in the memory equaled the present one, the present run would be considered criterional with a probability of .33. Independent of whether the current run was considered criterional, its signed deviation from the target replaced the oldest deviation in memory at the end of each trial (i.e., the memory always contained the most recent 24 deviations).
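The evaluation steps just described can be sketched in a few lines of Python. This is only an illustrative reading of the procedure, not the FORTRAN subroutines used in the experiment: the names `is_criterional`, `run_trial`, `TARGET`, `W`, and `MEMORY_SIZE` are hypothetical, and the symmetry adjustment described later is omitted. One point of interpretation is assumed: the criterion is taken as "closer than at least k comparisons," which matches the balanced-memory example given later (k = 9, criterional when closer than 9 other deviations), although "exceed k" could also be read as a strict inequality.

```python
import math
import random
from collections import deque

TARGET = 12       # target run length (from the text)
W = 0.33          # nominal probability of a criterional run (from the text)
MEMORY_SIZE = 24  # number of most recent trials retained (from the text)


def is_criterional(run, memory, target=TARGET, w=W, rng=random):
    """Return True if `run` meets the percentile criterion.

    `memory` holds signed deviations (target - run) from recent trials.
    """
    dev = target - run
    closer = equal = farther = 0
    for past in memory:
        # Zero deviations are classed as positive; skip opposite-side values.
        if (dev >= 0) != (past >= 0):
            continue
        if abs(dev) < abs(past):
            closer += 1
        elif abs(dev) == abs(past):
            equal += 1
        else:
            farther += 1
    m = closer + equal + farther              # same-side comparisons
    k = math.floor((m + 1) * (1 - w) + 0.5)   # k rounded to nearest integer
    if closer >= k:                           # assumed reading of "exceed k"
        return True
    if closer + equal >= k:                   # ties: criterional with prob. w
        return rng.random() < w
    return False


def run_trial(run, memory, target=TARGET, w=W, rng=random):
    """Evaluate one trial, then store its signed deviation in memory."""
    reinforced = is_criterional(run, memory, target, w, rng)
    memory.append(target - run)  # a full deque discards its oldest entry
    return reinforced


# Six recent short runs; a run of 6 is closer to 12 than all of them.
memory = deque([TARGET - r for r in (3, 3, 4, 2, 5, 3)], maxlen=MEMORY_SIZE)
print(is_criterional(6, memory))
```

Using a `deque` with `maxlen=24` gives the floating-memory behavior directly: a single store of the 24 most recent signed deviations, from which the same-side comparisons are drawn on each trial.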

Because the conditional probability of reinforcement for criterional and noncriterional runs was 1.0 and 0.0, respectively, and criterional and noncriterional runs were mutually exclusive, criterional runs and reinforcement were isomorphic. Thus, not only did the overall probability of a criterional run remain controlled at the experimentally specified probability of w = .33 throughout acquisition and maintenance, so did the overall probability of reinforcement.

The number of deviations above or below the target in the comparison distribution varied across trials between 0 and 24. Allowing memory size to float is preferable to maintaining separate, fixed-sized memories for deviations above and below the target because the latter strategy can lead to comparisons to deviations no longer characteristic of present performance. That is, even if runs consistently deviated short of the target for hundreds of trials, the latter strategy would leave the memory for deviations above the target untouched, such that a run longer than the target would be evaluated with respect to this distribution even though it no longer accurately reflected performance.

Memory size affects the operation of percentile schedules in two ways. First, as memory size gets small, the estimation of percentiles suffers. That is, because m observations define m + 1 intervals into which the next run can fall, each observation represents the pth percentile of the distribution, where p = 100/(m + 1). This places a lower limit on estimating criterional-response probability at p/100. Hence, for the percentile schedule to operate properly, a minimum number of comparison observations is necessary (here, to define the 33rd percentile, m must equal two or more). Second, memory size determines how long past behavior remains in the sample comprising the estimate of current behavior. As memory size increases, more remote runs contribute to this estimate. Occasional turnover in the comparison distribution is necessary to track any behavior change. Hence, memory size must be large enough to define necessary percentiles of the distribution accurately but small enough to allow frequent updating of the estimate of present performance. The memory size used here varied between trials from 0 to 24, allowing a maximum resolution of every 4th percentile while completely updating four times per session.
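The two memory-size constraints reduce to simple arithmetic, sketched below. The function names are hypothetical, and the nominal probability of .33 is treated as one third so that m = 2 (three intervals) qualifies, as stated in the text.

```python
def percentile_resolution(m):
    """Percentile step defined by m comparison values: p = 100 / (m + 1)."""
    return 100.0 / (m + 1)


def min_memory_for(w):
    """Smallest m whose resolution 1 / (m + 1) is no coarser than w."""
    m = 0
    while 1.0 / (m + 1) > w + 1e-9:
        m += 1
    return m


def full_turnovers(trials_per_session, memory_size):
    """Number of times the memory is completely replaced in one session."""
    return trials_per_session // memory_size


print(percentile_resolution(24))  # 4.0, i.e., every 4th percentile
print(min_memory_for(1 / 3))      # 2 comparison values at minimum
print(full_turnovers(100, 24))    # 4 complete updates in a 100-trial session
```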

A final procedural variant was employed in an attempt to shape behavior symmetrically around the target. A symmetry routine like that described in Galbicka and Platt (1989, p. 151) was employed, in which the value of w was adjusted (w') depending on how much m differed from 12, the expected number of comparison values in a balanced memory. The routine is best understood by assuming a balanced memory and working backwards. If the comparison distribution was perfectly balanced, with 12 values above and below the target, then from the percentile equation k = .67(13) = 8.71, subsequently rounded to 9. Hence, any deviation closer to the target than the fourth smallest deviation would meet the criterion (i.e., would be closer than 9 other deviations). The symmetry routine, therefore, first classified any run as criterional if there were currently fewer than four comparisons on the same side of the target (i.e., if m < 4, w' = 1.0). As the comparison distribution size increased above 4, w was modified in direct proportion to the deviation from symmetry, such that w' = 12w/m (i.e., for 4 < m < 12, w' > w; as the number of memory values approached symmetry, w' approached w). As memory size increased above 12 (i.e., the present run fell on the preferred side of the comparison distribution), fewer runs than nominally programmed were considered criterional (i.e., for m > 12, w' < w). This strategy becomes self-defeating, however, as comparison values overwhelmingly predominate on one side of the target (i.e., if m = 24, w' = w/2), as they would early in acquisition. This adjustment, therefore, was used only when the number of comparisons on the nonpreferred side exceeded 4 (and hence 4 < m ≤ 19). For m > 19, the quantity (1 - w) in the percentile equation was multiplied by 24/m. At the point of transition between these two algorithms, both specify w' = w/(2 - w) = .197, but the latter specifies w' approaches w as m approaches 24, restoring criterional response (and reinforcement) probability to the expected value.
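The piecewise adjustment can be written out as a small function. This is a sketch under stated assumptions rather than the routine actually used: the name `adjusted_w` and its parameter names are hypothetical, the case m = 4 (which the text leaves unspecified) is handled by the proportional rule capped at 1.0, and the adjustment for m > 19 is expressed as w' = 1 - (1 - w)(24/m), which is what multiplying (1 - w) by 24/m in the percentile equation implies.

```python
import math


def adjusted_w(m, w=0.33, balanced=12, full=24, low=4, high=19):
    """Adjusted criterional probability w' for m same-side comparisons."""
    if m < low:
        return 1.0                           # too few comparisons: always criterional
    if m <= high:
        return min(1.0, balanced * w / m)    # w' = 12w / m
    return 1.0 - (1.0 - w) * full / m        # (1 - w) scaled by 24 / m


for m in (3, 6, 12, 19, 20, 24):
    print(m, round(adjusted_w(m), 3))

# The two rules agree where 12w/m = 1 - 24(1 - w)/m, i.e., at m = 24 - 12w,
# which is about 20 for w = .33; there w' = w / (2 - w).
print(round(0.33 / (2 - 0.33), 4))
assert math.isclose(adjusted_w(24), 0.33)
```

With w = .33 the function returns w itself at both m = 12 (balanced memory) and m = 24 (all comparisons on one side), and values near .2 around the transition between the two rules.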

Under all conditions, the time of every stimulus event and every lever press was recorded such that the entire session could be reconstructed to the nearest 0.01 s. Data were subsequently transferred to a minicomputer (Digital Equipment Corporation) for storage and analysis.

RESULTS 

Figure 1 shows overall mean run (left responses per trial) for the group across sessions under the nondifferential baseline and targeted percentile conditions, as well as the mean run reinforced. The mean run under baseline was generally short (approximately three), and relatively stable. The mean reinforced run did not systematically differ from the overall mean, demonstrating the nondifferential nature of the baseline reinforcement contingency. Under the percentile schedule, mean run length increased rapidly, reaching an asymptotic level of just over 10 in approximately 20 sessions. Note that, as required by the percentile procedure, the mean reinforced run also increased steadily, remaining consistently closer to the target than the mean run overall.

Fig. 1. Run length (left responses per trial) for all runs (closed circles) or reinforced runs only (diamonds) for the group across sessions. Points and vertical bars are means ± SEM of individual-subject session means. Values to the left of the vertical dashed line were obtained under the nondifferential reinforcement baseline, those to the right under the targeted percentile schedule. The dashed horizontal line represents the target during the latter.

To provide a gross measure of how this change in the group mean reflected changes in individual performance, Figure 2 presents the cumulative percentage of subjects attaining various acquisition criteria as a function of time under the percentile schedule. To derive these values, every session was first divided into five 20-trial blocks, and then the entire sequence was scanned for 25 or 50 consecutive blocks during which the mean run for a particular subject remained at or above either 50%, 67%, or 75% of target. The block size was set at 20 trials to provide the minimal aggregate over which various other measures of behavior and reinforcement could evince a range of meaningful values (i.e., values that could potentially demonstrate substantial variability for reasons other than small sample size). The block in which the 25th (or 50th) consecutive block occurred constituted the acquisition block for that subject; hence, the minimum value was 25 (or 50). The fastest subject met the 50% and 67% criteria shortly after the minimum, irrespective of the number of consecutive blocks required, and met the 75% criterion for 25 consecutive blocks after just over 50 blocks (during the 11th session) and for 50 consecutive blocks just prior to the 100th block. All but 2 subjects met the 50% criterion for 25 consecutive blocks within 100 blocks, whereas 80% of the subjects met the 67% criterion and 40% met the strictest criterion for 25 consecutive blocks within the same period. After 50 sessions (250 blocks), just over 70% of the subjects had met the 75% acquisition criterion for 25 consecutive blocks. The required number of consecutive blocks interacted with the percentage of target required in determining the percentage of subjects meeting acquisition. The percentage of subjects attaining the 50% criterion was only slightly decreased by increasing the number of consecutive blocks required, with over 80% meeting the criterion for 50 consecutive blocks by the 100th block. Only 60% of the subjects maintained run lengths equal to or greater than 67% of the target for 50 consecutive blocks within the first 200 blocks, compared with over 80% for the 25-block criterion, whereas the percentage of subjects meeting the 75% criterion for 50 consecutive blocks was reduced even more over its 25-block counterpart, with only 20% meeting criterion (compared to 70%) within the first 250 blocks.

Fig. 2. Cumulative percentage of subjects maintaining a minimum mean run of 50%, 67%, or 75% of target for either 25 (left panel) or 50 (right panel) consecutive 20-trial blocks (five blocks per session) as a function of consecutive block number under the percentile schedule. The lines increment during the session in which the 25th (or 50th) block occurred.
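The acquisition criteria used for Figure 2 amount to a scan for a sufficiently long streak of blocks at or above a fraction of the target. A minimal sketch follows; the function names `block_means` and `acquisition_block` are hypothetical, and the example data are invented solely to exercise the code.

```python
def block_means(runs, block_size=20):
    """Mean run length in consecutive, non-overlapping blocks of trials."""
    return [
        sum(runs[i:i + block_size]) / block_size
        for i in range(0, len(runs) - block_size + 1, block_size)
    ]


def acquisition_block(means, target=12, fraction=0.5, n_consecutive=25):
    """Return the 1-based block on which the streak is completed, or None."""
    threshold = fraction * target
    streak = 0
    for block, mean_run in enumerate(means, start=1):
        streak = streak + 1 if mean_run >= threshold else 0
        if streak == n_consecutive:
            return block
    return None


# Invented example: 10 blocks averaging 3, then 30 blocks averaging 7.
means = [3.0] * 10 + [7.0] * 30
print(acquisition_block(means, fraction=0.50))  # 35: 25th block at or above 6
print(acquisition_block(means, fraction=0.75))  # None: never reaches 9
```

Because the acquisition block is the one that completes the streak, the smallest possible value is the streak length itself (25 or 50), as noted in the text.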

Figure 3 shows mean run (overall and reinforced) across 20-trial blocks for each of 4 subjects, selected to illustrate characteristics of the percentile procedure as well as of responding. Subjects 38 and 39 showed fairly typical acquisition under the percentile procedure. Run length gradually increased to a value slightly lower than target, during which time the mean run reinforced increased as well to remain longer than the overall mean. As run length increased above the target, however, the mean reinforced run remained displaced nearer the target, such that it was now relatively shorter than the overall mean (e.g., Subject 39’s data during Blocks 90 through 100). Run length subsequently decreased below the target, such that reinforced runs were now relatively longer than the mean, and the cycle repeated, with noticeable oscillation in run length. For Subject 38, these oscillations appeared as almost a sawtooth pattern, whereas for Subject 39 transitions were more gradual (the inset in each panel expands several cycles for each subject). Subject 40’s results demonstrate that these oscillations did not always occur, and that not only did the mean reinforced run increase with increases in overall run length to the target value but it also decreased to track decreases in overall run length, both during the long sequence between Blocks 25 and 50 and during the single blocks at approximately Blocks 175 and 220, for example. In all these instances, however, the mean reinforced run always remained closer to the target than the mean run on that block, maintaining the differential reinforcement contingency. Finally, Subject 50’s data present an extreme example of delayed acquisition. Other than the extended period of near-invariant short runs for the first 75 blocks, however, there was little to distinguish this subject’s data once acquisition began. It occurred more gradually than for Subjects 38 and 39, but this was also true of other subjects. Note that throughout the targeted percentile procedure, even before runs began to change appreciably for this subject, reinforcement remained differentially contingent on runs closer to the target, albeit by a slender margin.

Fig. 3. Mean run (left responses per trial) on all trials (connected lines) or reinforced trials only (diamonds) for 4 subjects (separate panels) during consecutive 20-trial blocks of baseline (left of the vertical in each panel) or the percentile schedule (right of the vertical). The horizontal dashed line indicates the target during the percentile schedule. The insets in the panels for Subjects 38, 39, and 50 expand several cycles of run-length oscillation.

One  factor  that  might  influence  time  to  com¬ 
plete  acquisition  is  the  amount  of  variability 
present  in  the  baseline  run  distribution  from 
which  the  percentile  schedule  selects  criter- 
ional  runs.  An  inverse  relation  might  be  ex¬ 
pected,  such  that  less  variability  under  baseline 
would  correlate  with  more  extended  acquisi¬ 
tion.  This  expectation  was  only  partially  borne 


out  bv  the  present  data.  Table  1  shows  cor¬ 
relation  coefficients  (r)  between  the  standard 
deviation  of  runs  from  the  last  five  baseline 
sessions  for  each  subject  and  the  session  on 
which  that  subject  met  each  of  the  different 
acquisition  criteria  presented  in  Figure  1,  fur¬ 
ther  classified  by  whether  acquisition  occurred 
within  150  blocks  or  400  blocks.  Also  shown 
are  the  probabilities  by  which  each  coefficient 
differed  statistically  from  zero  (p)  and  the 
number  of  subjects  on  which  the  correlation 
was  based.  A  relatively  strong  inverse  corre¬ 
lation  was  apparent  between  run  variability 
and  time  to  acquisition  at  both  50%  criteria 
for  subjects  acquiring  by  the  150th  block.  Ex¬ 
tending  the  window  to  the  400th  block  weak¬ 
ened  both  correlations,  although  the  one  for 
the  25-block  criterion  remained  relatively  sub¬ 
stantial  {p  <  .05).  Correlations  based  on  the 
67%  criterion  were  generally  smaller  than  their 
50% counterparts, except for those based on subjects reaching the
50-block criterion by the 150th block; these did achieve statistical
significance (p < .05). Correlations based on the 75% criterion were
generally nonsignificant, except for the correlation based on subjects
reaching the 50-block criterion within the larger window. This yielded
the largest and only significant positive correlation coefficient of
any condition (r = 0.78, p < .05). Hence, it appears that baseline
variability may help predict an initial, relatively small change in
the direction of the target, but not the time to fine-tune a
differentiation around a particular target value. This interpretation,
of course, should be tempered by the small sample sizes on which the
significant 67% and 75% correlations were based.

Table 1

Pearson product moment correlations (r) between individual subjects'
run-length standard deviations during the last 5 days of baseline and
the block on which they met the six different acquisition criteria,
along with the probability that the coefficient equaled zero (p) and
the number of subjects on which each correlation was based (N). The
rightmost columns present correlations obtained using all subjects
that acquired the differentiation at the different levels by the 400th
block, and the middle three columns are correlations based only on
those subjects that achieved acquisition within the first 150 blocks.

                        Subjects meeting criterion
   Criterion        By 150th block         By 400th block
Target   Block      r       p     N        r       p     N
  50      25      -0.51    .01    26     -0.44    .02    28
  50      50      -0.55    .01    26     -0.30    .13    26
  67      25      -0.?7    .08    23      0.33    .11    24
  67      50      -0.66    .04    10      0.08    .?5     ?
  75      25      -0.32    .2?    13      0.18    .44    21
  75      50       0.17    .83     4      0.78    .02     8

Note. A question mark stands for a character that is illegible in the
source copy.
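
The correlations and probabilities in Table 1 can be illustrated with a minimal sketch. It assumes the conventional test of a Pearson coefficient against zero, a t statistic with N - 2 degrees of freedom; the text does not say which test was actually used. The function names and the sample data are hypothetical and are not the study's data.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r against zero."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Hypothetical values: baseline run-length SD and block of acquisition
# for five imaginary subjects (illustration only).
baseline_sd = [1.0, 2.0, 3.0, 4.0, 5.0]
acquisition_block = [90, 120, 60, 70, 30]
r = pearson_r(baseline_sd, acquisition_block)
print(r, t_statistic(r, len(baseline_sd)))

# A tabled entry can be converted directly: r = -0.66 with N = 10
# gives t of about -2.48 on 8 degrees of freedom.
print(t_statistic(-0.66, 10))
```

The resulting t would then be referred to a t distribution to obtain p; that lookup is left out here to keep the sketch free of external libraries.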

To  provide  an  indication  of  how  different 
behavioral  measures  concurrently  changed  and 
to  present  data  for  some  additional  subjects, 
Figure  4  shows  five  different  measures  plotted 
across  20-trial  blocks  for  6  subjects  (Subject 
50’s  run-length  data  were  also  presented  in 
Figure  3).  The  measures  were  chosen  such  that 
they  could  simultaneously  be  presented  on 
semilogarithmic  axes  with  minimal  overlap. 
They are, in order of increasing frequency,
reinforcement  rate,  reinforcement  probability, 
response rate, run length, and trial rate. Subjects 34, 43, and 53
show the most typical acquisition pattern. Imposition of the targeted
percentile procedure increased run length rapidly from a mean between
two and three to a value that oscillated between eight and 14.
Reinforcement probability remained relatively constant throughout this
change in run length. This increase in presses per trial most often
occurred concomitant with an increase in response rate, although for
Subject 34 this rate increase was slightly delayed. The increased
response rate, however, seldom compensated for the increase in the
mean run, such that the rate of trial completion decreased drastically
to around half its baseline value. Because reinforcement probability
was experimentally controlled, this decrease in trial rate
concomitantly decreased overall reinforcement rate. Subject 55 was one
of the few subjects for whom response rate increased parallel to the
increased number of responses per trial, keeping the rate of trial
completion (and hence reinforcement) constant. Subject 50's results
are  again  striking  because  of  the  delay  in  ac¬ 
quisition.  Mean  run  length  was  decreasing  for 
this  subject  during  baseline,  and  imposing  the 
targeted  percentile  schedule  did  not  reverse  this 
trend,  most  immediately  resulting  in  almost 
complete  minimal  runs  on  each  trial  (i.e.,  runs 
of  one).  Response  rate  stabilized  during  this 
time  such  that  the  rate  of  trial  completion  ap¬ 
proached  30  trials  per  minute,  generating  a 
high  and  stable  reinforcement  rate  as  well. 
After  approximately  15  sessions,  and  despite 
the  existing  high  rate  of  reinforcement,  ac¬ 
quisition  finally  commenced,  and  although  re¬ 
sponse  rate  increased  substantially  during  this 
period,  trial  and  reinforcement  rates  were 
driven  down  by  almost  two  thirds  as  mean  run 
approached  the  target. 
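
The arithmetic linking these measures can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis: it assumes each trial consists of the left-lever run plus one trial-ending right-lever press, that response rate counts presses on both levers, and that no time is lost to food delivery or intertrial intervals. The function name and the numerical values are hypothetical.

```python
def molar_rates(response_rate, run_length, p_reinforcement, ending_presses=1):
    """Return (trial rate, reinforcement rate) in the time unit of response_rate.

    Assumption: a trial takes run_length left presses plus ending_presses
    right presses, all emitted at the overall response rate.
    """
    responses_per_trial = run_length + ending_presses
    trial_rate = response_rate / responses_per_trial
    reinforcement_rate = trial_rate * p_reinforcement
    return trial_rate, reinforcement_rate

# Hypothetical baseline: runs of 3 at 1.0 response per second.
baseline = molar_rates(response_rate=1.0, run_length=3, p_reinforcement=0.33)

# Hypothetical differentiation: runs of 12 with response rate up by half.
differentiated = molar_rates(response_rate=1.5, run_length=12, p_reinforcement=0.33)

print(baseline)        # trials and pellets per second under baseline
print(differentiated)  # both fall despite the faster responding
```

With reinforcement probability per trial fixed, reinforcement rate can only track trial rate, so response rate would have to rise in proportion to responses per trial to hold reinforcement rate constant.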

Subject  56  was  the  only  subject  who  failed 
to  maintain  differentiated  runs  in  the  vicinity 
of  the  target.  As  run  length  increased  from 
around  three  to  about  12  after  10  sessions  un¬ 
der  the  percentile  procedure,  response  rate, 
which  was  already  relatively  high  (two  re¬ 
sponses  per  second),  increased  by  only  about 
one third. As a result, trial rate and reinforcement rate plummeted.
During the next 15 sessions, run length decreased, increasing trial
and reinforcement rates. This was followed by a subsequent increase in
run length for approximately 10 sessions, with a correlated decrease
in trial and reinforcement rates. Thereafter, run length consistently
decreased to near baseline values, restoring trial and reinforcement
rates to the high values obtained prior to the short-lived
differentiation.

Fig. 4. Trial rate (trials per 2 min: diamonds), run length (left
responses per trial: solid line), response rate (responses per second:
squares), reinforcement probability (pellets per trial: dashed line),
and reinforcement rate (pellets per minute: triangles) for each of 6
subjects (individual panels) under the baseline and percentile
procedures (left and right of the vertical in each panel). Values
represent block means. Note the semilogarithmic axes. Horizontal lines
indicate the percentile target (upper line) and the expected
reinforcement probability (lower line).

A  close  look  at  the  reinforcement  probabil¬ 
ities  in  Figure  4  reveals  a  small  but  systematic 
decrease  below  the  value  programmed,  cor¬ 
related  with  periods  when  mean  runs  were 
slightly  below  the  target.  This  decrease  was 
evident  for  Subjects  34,  43,  and  55  from  ap¬ 
proximately  Block  50,  and  for  Subject  53  from 
Block 75 onward, except for the period between Blocks 150 and 200
for Subject 43, during which mean runs fell even further below
the  target.  For  Subject  50,  the  decrease  in  re¬ 
inforcement  probability  was  not  evident  except 
for the short period between Blocks 275 and 300, during which the
mean run remained very
close  to,  but  short  of,  the  target.  For  Subject 
56,  variability  in  the  mean  run  made  detecting 
a  consistent  decrease  in  reinforcement  proba¬ 
bility  difficult;  however,  after  runs  began  to 
decrease  consistently  (approximately  Block 
225),  reinforcement  probability  became  less 
variable  and  showed  no  decrease.  These  vari¬ 
ations  from  the  nominal  probability  pro¬ 
grammed  by  the  percentile  schedule  likely  re¬ 
sulted  from  the  memory  symmetry  routine, 
which  operated  only  after  runs  longer  than  the 
target  comprised  a  portion  of  the  comparison 
distribution.  When  all  runs  fell  short  of  the 
target  early  in  acquisition,  the  routine  did  not 
operate. Once runs above the target were occasionally emitted,
however, criterional response probability was reduced for runs on the
preferred side (below target) and incremented for runs on the
nonpreferred side. If this restored the distribution to symmetry, the
resulting probability of a criterional response would be the nominal
value (w). However, because the distribution remained asymmetrically
positioned below the target, most runs were selected with an adjusted
probability w' < w, and reinforcement probability remained slightly
reduced.

[Figure 5. Panel title: Post-Nonfood Trials. Horizontal axis:
Deviation From Prior Run.]

Fig. 5. Frequency distributions of deviations from the previous run,
limited to those trials following a run of between 8 and 16, during
the penultimate session under baseline (Session -2) and during
Sessions 3, 10, 25, and 50

To  examine  local  changes  in  runs  at  differ¬ 
ent  points  during  differentiation,  deviations 
between  successive  runs  (i.e.,  the  difference 
between  the  current  and  the  previous  run)  were 
computed for every subject during the penultimate session under
baseline (-2), and the 3rd, 10th, 25th, and 50th sessions under the
percentile  procedure.  Because  run  length  is 
bounded  by  a  physical  minimum  and  most 
likely  by  a  behavioral  maximum,  deviations 
between  successive  values  are  likewise  con¬ 
strained  (e.g.,  a  distribution  comprised  solely 
of  small  runs  cannot  have  large  negative  de¬ 
viations).  To  minimize  the  effects  of  these  con¬ 
straints  and  provide  a  less  biased  measure,  de¬ 
viations  were  determined  only  if  the  run  on 
the  reference  (preceding)  trial  was  between 
eight  and  16.  The  top  panel  of  Figure  5  shows 
the  frequency  of  all  deviations  for  the  group, 
and  the  bottom  two  panels  segregate  deviations 
by  whether  food  was  presented  on  the  refer¬ 
ence  trial.  Absolute,  as  opposed  to  relative, 
frequencies  are  presented  to  indicate  changes 
in  the  number  of  observations  comprising  each 
distribution,  as  well  as  how  those  deviations 
were  distributed.  Given  the  differences  in  total 
observations  between  distributions,  however, 
comparisons  should  emphasize  relative  shapes 
and  not  absolute  frequencies.  Under  baseline 
and  the  third  percentile  session,  most  devia¬ 
tions  were  negative.  This  was  not  surprising 
because  the  minimum  run  on  the  previous  trial 
was  eight  and  the  mean  run  at  this  time  was 
around  three  (see  Figure  1).  As  the  differen¬ 
tiation  progressed,  the  upper  tail  of  the  overall 
distribution  extended  to  include  more  positive 
deviations. The mode ultimately settled at -1, and the distribution
appeared relatively symmetric. Deviations following criterional runs
between eight and 16 (middle panel) were shifted toward negative
deviations. Conversely, distributions of deviations following
noncriterional runs between eight and 16 had relatively larger numbers
of positive deviations, with a mode of 0 and +1 during the 10th and
25th sessions and positively displaced secondary modes during the
10th, 25th, and 50th sessions.
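
The tabulation behind Figure 5 can be sketched as follows. This is a minimal illustration that assumes the bounds of eight and 16 are inclusive and that each trial is recorded as a run length plus a flag for food delivery; the function name and the sample session are hypothetical.

```python
from collections import Counter

def successive_deviations(runs, fed, low=8, high=16):
    """Tabulate current-minus-previous run deviations.

    runs: run length on each trial, in order.
    fed:  True where food was delivered on that trial.
    Only trials whose preceding (reference) run lies in [low, high]
    contribute, as in the analysis described above.
    Returns three Counters: all deviations, those following food,
    and those following no food.
    """
    all_devs, post_food, post_nonfood = Counter(), Counter(), Counter()
    for prev_run, curr_run, prev_fed in zip(runs, runs[1:], fed):
        if low <= prev_run <= high:
            deviation = curr_run - prev_run
            all_devs[deviation] += 1
            if prev_fed:
                post_food[deviation] += 1
            else:
                post_nonfood[deviation] += 1
    return all_devs, post_food, post_nonfood

# Hypothetical seven-trial session (illustration only).
runs = [9, 7, 12, 13, 3, 8, 8]
fed = [True, False, False, True, False, True, False]
all_devs, post_food, post_nonfood = successive_deviations(runs, fed)
print(sorted(all_devs.items()))
```

Restricting the reference run to a band in the middle of the range leaves room for deviations in both directions, which is the reason given above for the eight-to-16 window.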


DISCUSSION 

All models of behavior that discount the influence of local
reinforcement contingencies in deference to aggregate relations
predict that runs should have remained short throughout this study,
because such runs maximize trial
rate  and/or  minimize  the  number  of  responses 
per  reinforcer.  Maximizing  trial  rate  maxi¬ 
mizes  reinforcement  rate,  because  reinforce¬ 
ment  probability  per  trial  was  constant 
throughout.  Minimizing  responses  per  trial 
increases reinforcement probability per response or decreases the
"price" of food (cf.
Hursh,  1980).  Each  of  these  is  easily  accom¬ 
plished  by  responding  once  on  the  left  lever 
and  then  switching  to  the  right  lever  to  end 
the  trial. 
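
The molar prediction can be made concrete with a short sketch. It assumes a reinforcement probability of .33 per trial (the value given in the abstract) and one right-lever press per trial; the function names are hypothetical, and "price" is taken here simply as responses per pellet.

```python
def responses_per_pellet(run_length, p_trial=0.33, ending_presses=1):
    """Expected responses emitted per pellet (the 'price' of food)."""
    return (run_length + ending_presses) / p_trial

def p_reinforcement_per_response(run_length, p_trial=0.33, ending_presses=1):
    """Reinforcement probability per response for a given run length."""
    return p_trial / (run_length + ending_presses)

# Price of food for every run length from the minimum (1) to the target (12).
prices = {run: responses_per_pellet(run) for run in range(1, 13)}
cheapest_run = min(prices, key=prices.get)

print(cheapest_run)            # the run length that minimizes price
print(prices[1], prices[12])   # price at the minimum run and at the target
```

Because price rises with every added left-lever press while reinforcement probability per trial stays fixed, any account based only on these aggregate quantities favors the minimum run.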

Fig. 5 (caption continued). ...under the targeted percentile
procedure. The top panel presents all deviations, and the bottom
panels segregate deviations depending on whether the previous trial
ended in food. See figure legend for session identification. Values
are total frequencies for the group.

Of the 30 subjects studied, however, the behavior of only 1 even
remotely approached this prediction. Most subjects made runs longer
than one under both the baseline and the percentile procedures. No
doubt, these models could be modified to allow the variability induced
by intermittent reinforcement under both procedures to predict runs
longer than the absolute minimum, but this cannot account for the
differential results under the two procedures. Under the
nondifferential baseline, when there was no local contingency with
respect to run length, subjects approximated the minimum allowable run
by making relatively short runs. But subjects in the present study
overwhelmingly acquired differentiated responding when the targeted
percentile procedure was instituted, making not merely longer runs but
runs in the vicinity of an experimentally defined target, even though
doing so did not increase reinforcement probability (either per trial
or per response), required more responses per pellet, and resulted
either in the same reinforcement rate (at best) or often severely
reduced it. Of the 30 subjects in this study, only 1 avoided being
"trapped" by the percentile schedule into emitting a response pattern
that did not optimize aggregate reinforcement parameters. Further, the
present subjects represent only the most recent ones to be exposed to
the contingencies described here. Runs of over 100 subjects have now
been differentiated under targeted percentile schedules like the
present one, with similar results (cf. Galbicka, Fowler, & Ritch,
1991; Galbicka, Kautz, & Ritch, 1992). This differentiated responding
has never achieved a higher overall rate or probability of
reinforcement, and by definition has required more responses per
reinforcer. These characteristics remain true at all levels of
aggregation examined, from different conditions, over sessions, or in
blocks of as few as 20 trials. Adding these results to those obtained
with other percentile procedures that differentiated response
dimensions from interresponse-time duration (e.g., Alleman & Platt,
1973; Arbuckle & Lattal, 1992; Galbicka & Platt, 1986; Kuch & Platt,
1976), to response or changeover duration (e.g., Platt, 1984), to
spatial response location (e.g., Galbicka & Platt, 1989; Scott &
Platt, 1985), to response variability (Machado, 1989), maintaining
either a constant overall reinforcement probability or rate
throughout, these results
present  a  challenge  to  models  of  behavior 
change  that  are  predicated  on  changes  in  ag¬ 
gregate  reinforcement  rate  or  probability.  This 
is  not  to  deny  that  such  factors,  if  varied,  pro¬ 
duce  systematic  changes  in  behavior.  But  sub¬ 
stantial  behavior  change  often  occurs  in  the 
absence of changes in these reinforcement dimensions, and sometimes,
as is the case here,
change  occurs  even  despite  unfavorable  changes 
in  reinforcement  density.  The  present  results 
indicate  that  aggregate  relations  should  not  be 
considered  fundamental  in  the  control  of  be¬ 
havior.  Rather,  they  probably  represent  the 
combined  effects  of  more  local  relations  that 
drive  behavior  change.  Although  it  was  rea¬ 
sonable  to  begin  attempting  to  quantify  be¬ 
havior  by  eliminating  sources  of  local  variation 
and developing models of the relatively homogeneous behavior that
results (like responding under constant-probability variable-interval
schedules), a complete model of behavior
must  ultimately  be  able  to  account  for  behavior 
change  that  is  produced  both  by  changes  in 
overall  reinforcement  rates  and  in  more  local 
relations  like  the  one  programmed  by  percen¬ 
tile  schedules.  Perhaps  it  is  time  to  change 
strategies and attempt to model the local dynamics of responding as
they are related to local reinforcement characteristics, while keeping
as a linchpin of any such model the requirement that it track the
behavioral effects of changing aggregate reinforcement parameters as
well.

The present study is meant more to provoke such a local analysis than
to provide one. Recent forays into behavioral dynamics, including
models based on the sequential structure of responding (e.g., Hoyert,
1992; Palya, 1992) or on linear-systems analysis (e.g., McDowell,
Bass, & Kessel, 1992) suggest potential starts. That subjects are
capable of discriminating sequential structure in environmental events
as well as in behavior should come as no surprise; the areas of
psychophysics dealing with topics such as timing (e.g., Gibbon &
Allan, 1984), numerosity (e.g., Gallistel, 1989), and so forth are
replete with such demonstrations. In fact, the anchoring of behavior
around temporal, numerical, spatial, or other cues differentially
correlated with reinforcement is so
pervasive  that  models  incapable  of  providing 
for  such  correlation  must  be  considered  incom¬ 
plete  at  best.  A  viable  model  of  operant  be¬ 
havior  must  account  for  the  development  of 
behavioral  structure  as  it  is  warped  by  rein¬ 
forcement  and  the  environmental  events  that 
act  as  signposts  for  biologically  significant  con¬ 
sequences  (cf.  Killeen,  1992). 

Differentiating  response  number  under  tar¬ 
geted  percentile  schedules  may  reveal  a  greater 
role  for  sequential  dependencies  in  run  length 
because,  unlike  traditional  reinforcement 
schedules,  percentile  schedules  are  explicitly 
designed  to  operate  on  local  structure  in  re¬ 
sponding.  Paradoxically,  percentile  schedules 
keep  the  overall  probability  of  reinforcement 
constant  by  providing  a  maximal  transition  in 
reinforcement  probability  (from  0  to  1)  for 
behavior  relatively  closer  to  the  target.  Because 
the  reinforcement  contingency  is  based  on  the 
relation  between  current  and  recent  behavior, 
it  would  not  be  surprising  to  find  a  greater 
degree  of  sequential  structure  in  behavior  than 
that  reported  under  more  typical  free-operant 
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arrangements  (e.g.,  Palya,  1992;  Peavev,  Mc¬ 
Dowell,  &  Kessell,  1992).  The  oscillatory  pat¬ 
terns  in  run  length  here,  for  example,  suggest 
some  very  long-term  sequential  structure  with 
at  least  some  subjects. 
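
How such a schedule operates on local structure can be sketched in a few lines. This is a simplified illustration, not the procedure as programmed: the target of 12, the 24-trial memory, and the 67% fraction are taken from the abstract, while the handling of ties, of runs exactly at the target, and of an empty same-side comparison set are assumptions made here. The memory symmetry routine is omitted entirely, and the function name is hypothetical.

```python
from collections import deque

def is_criterional(current_run, memory, target=12, fraction=0.67):
    """Decide whether the current run meets the targeted percentile criterion.

    The run is criterional if it is nearer the target than at least
    `fraction` of the remembered runs lying on the same side of the target.
    Assumptions (not from the source): ties count as not nearer, a run
    exactly on target is always criterional, and so is a run with no
    same-side comparison runs in memory.
    """
    if current_run == target:
        return True
    same_side = [r for r in memory if (r - target) * (current_run - target) > 0]
    if not same_side:
        return True
    distance = abs(current_run - target)
    farther = sum(1 for r in same_side if abs(r - target) > distance)
    return farther >= fraction * len(same_side)

# Memory of the 24 most recent runs; older runs drop out automatically.
memory = deque([3] * 24, maxlen=24)

print(is_criterional(4, memory))  # one step closer to the target
print(is_criterional(3, memory))  # repeating the dominant run
memory.append(4)                  # the memory is updated after every trial
```

Because the criterion is recomputed from recent behavior on every trial, a run that earned food earlier in acquisition stops doing so once the comparison distribution has moved toward the target.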

At the other extreme, the data on the deviations presented in Figure 5
suggest a differential result on a trial-by-trial basis depending on
the outcome of responding, in that deviations following food
presentation were generally more likely to be negative, whereas those
following trials without food were more often positive. This contrasts
with the data on deviations presented for spatial response location on
a circular dimension (Galbicka & Platt, 1989), where deviations were
generally centered on the previous response location, with minimal
dispersion on trials following food and greater dispersion on those
following no food. Both sets of data suggest that reinforcement
increases the probability of emitting the
behavior  most  recently  associated  with  food. 
In  the  spatial  situation,  this  involves  returning 
to  the  previous  location;  here,  it  involves  press¬ 
ing  the  right  lever,  but  that  in  turn  means 
prematurely  ending  the  current  run. 

This  analysis  emphasizes  that  acquisition  of 
differentiated  runs  requires  acquisition  and 
extinction  of  several,  sometimes  opposing,  op¬ 
erants,  and  a  dynamic  model  of  such  acqui¬ 
sition  should  make  this  explicit.  First,  subjects 
must  learn  to  press  the  right  lever,  because 
responses  there  terminate  the  trial  and  are  most 
closely  followed  by  food.  But  pressing  the  right 
lever  alone  must  ultimately  undergo  extinc¬ 
tion,  because  only  right-lever  presses  following 
at  least  one  left-lever  press  produce  any  con¬ 
sequences.  So  left-lever  pressing  is  differen¬ 
tially  reinforced  and  increases.  But  there  are 
upper  limits  to  the  amount  of  left-lever  press¬ 
ing,  imposed  both  by  the  percentile  schedule, 
which  begins  reinforcing  shorter  runs  differ¬ 
entially  as  comparison  runs  become  increas¬ 
ingly  long  above  the  target,  and  by  the  inherent 
delay  to  food  or  increased  effort  involved  in 
completing  a  longer  run.  The  tendency  for 
runs  to  stabilize  asymmetrically  at  values 
slightly  below  the  target  most  likely  reflects 
the  opposition  of  the  differential  reinforcement 
provided  by  the  percentile  schedule  with  that 
associated  with  completing  a  run  (cf.  Platt, 
1984). 

There remain higher order dynamics that might differentiate even more
complex operants in the present situation. The percentile schedule
provides reinforcement differentially for deviations towards a target,
not for a particular run per se, and may therefore establish deviation
as an operant. Hence, a model of behavior in the present study might
need to consider not only the run reinforced on a particular trial but
also the directional change in behavior from trial to trial when
reinforcement was delivered. Similar suggestions have been offered in
the past; Skinner (1938) suggested that response number within a fixed
interval could be differentiated, Zeiler and colleagues demonstrated
that the time to complete a fixed ratio is an operant (see Zeiler,
1977, for a review), and Silberberg and Ziriax (1982) suggested that
concurrent-schedule performance is best understood not in terms of
individual key pecks but in terms of the differential reinforcement
provided for changing between schedules. These suggestions all
emphasize that aspects of behavior other than single presses can be
conditioned; the percentile procedure
used  here  makes  this  even  more  evident  by 
establishing  reinforcement  contingencies  for 
appropriate  deviation. 

The  present  results,  therefore,  pose  a  quan¬ 
dary  to  existing  quantitative  models  of  operant 
behavior.  These  models  presume  that  behavior 
matches,  maximizes,  or  is  otherwise  controlled 
by  some  aspect  of  aggregate  reinforcement  pa¬ 
rameters  that  yield  some  overall  benefit  to  the 
subject  (or  at  least  do  not  worsen  its  lot).  Yet 
it  is  difficult  to  see  how  behavior  of  the  subjects 
in  the  present  study  could  be  construed  as  pro¬ 
viding  any  benefit,  except  in  the  short  term 
(i.e.,  on  the  next  trial).  The  percentile  schedule 
used  here  drives  aggregate  reinforcement  pa¬ 
rameters  away  from  any  long-term  optimum, 
in  a  sense  by  placing  long-term  and  short-term 
goals,  or  aggregate  versus  local  reinforcement, 
in  opposition.  It  makes  it  difficult  for  subjects 
to  keep  doing  what  they  were  doing  under 
baseline  by  offering  an  immediate  incentive  for 
doing  something  different  (i.e.,  repeating  a  run 
that  currently  dominates  the  memory  will  be 
reinforced with probability w, but moving one
step  closer  to  the  target  will  always  produce 
reinforcement).  The  percentile  schedule  is,  in 
effect,  a  socialist  version  of  capitalism  realized, 
in  that  it  guarantees  a  fixed  probability  of  re¬ 
inforcement  independent  of  performance  while 
at the same time providing incentive for behavior change. (My thanks
to G. Jean Kant for this interesting analogy.) Although overall
reinforcement  probability  remains  constant,  the 
promise  of  reinforcement  on  the  next  trial  drives 
continuous  improvement.  Prosperity  remains 
ever  around  the  corner,  yet  never  appears. 
Viewed  in  this  way,  differentiated  responding 
represents  a  lack  of  self-control,  in  that  suc¬ 
cumbing  to  local  reinforcement  contingencies 
drives  overall  reinforcement  density  down  (cf. 
Logue, 1988). But it could as easily be argued
that  differentiated  responding  demonstrates  self- 
control,  and  not  a  lack  thereof,  because  right- 
lever  presses  must  increasingly  be  delayed  for 
reinforcement  on  the  next  trial.  Therein  lies 
the  quandary. 
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